Now that you have your data ready, you are curious about the neighborhoods. Naturally, you care most about the areas with the highest and lowest median values, as well as the areas with the largest gains and losses in value since 2010.
In order to tap into four-year neighborhood housing value changes, you merged ACS neighborhood data from the 2005-2010 five-year estimates. Because the Census Bureau uses neighborhood samples of residents and households, it discourages any cross-time comparisons shorter than five years at a time. Also, while the Census Bureau changes some of its neighborhoods every decade, the data used here arise from the same maps and thus provide the best apples-to-apples comparison of neighborhoods.
Before going ahead, it is important to prepare the data a little further. For various reasons, nine census tracts do not have a median housing value; thus, we must remove them from our analysis before we begin to show any correlations. So, let’s do this!
#################################
#
setwd('C:/Users/John/Denver_Housing_Project/ACS_Data/Final_Data')
load("ACSData9_14.RData")
# Noticing the nine missing values
table(is.na(ACSData9_14$MedHouseVal))
##
## FALSE TRUE
## 578 9
# Removing the missing values
test <- subset(ACSData9_14, is.na(MedHouseVal))
ACSData9_14nona <- subset(ACSData9_14, !is.na(MedHouseVal))
#Now no NA's in median housing value
table(is.na(ACSData9_14nona$MedHouseVal))
##
## FALSE
## 578
library(maptools)
library(rgdal)
library(sp)
library(ggplot2)
library(ggmap)
library(digest)
library(Hmisc)
gpclibPermit()
## [1] FALSE
library(psych)
library(ggthemes)
library(plyr)
library(dplyr)
library(RColorBrewer)
library(colorspace)
Because you are curious about housing values in the neighborhoods, you shrink the data to only the variables of interest and then arrange the data as you wish. Then, for the sake of exploring the margins, you decide to limit the results to the top twenty census tracts. (You also inspect the data frame to make sure it is ready.)
Before digging in deeper, you take a peek at housing values in general.
ACSData9_14nona.mhv <-ACSData9_14nona[c("Tract",
"JR_Name",
"County",
"MedHouseVal",
"MedHouseVal_10",
"HV_4YrChg")]
class(ACSData9_14nona.mhv)
## [1] "data.frame"
class(ACSData9_14nona.mhv$MedHouseVal)
## [1] "numeric"
summary(ACSData9_14nona$MedHouseVal)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 27500 181900 243100 272200 331000 1000000
summary(ACSData9_14nona$MedHouseVal_10)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 19200 188700 239300 274800 325000 1000000
summary(ACSData9_14nona$HV_4YrChg)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -44.300 -7.200 -1.700 -1.031 3.575 199.300
library(plotly)
library(ggplot2)
qplot(x = MedHouseVal, data = ACSData9_14, binwidth = 10000,
xlab = 'Median housing value',
ylab = 'Number of neighborhoods',
color = MedHouseVal)
ACSData9_14nona$MedHouseVal2 <-ACSData9_14nona$MedHouseVal/1000
t <- ggplot(ACSData9_14nona, aes(x=County, y=MedHouseVal2)) + stat_summary(fun.y="median", geom="bar", fill="darkgreen", color="lightgreen")
t + ylab("Middle value") + xlab("County") +
labs(title = "Median Housing by County") +
geom_hline(yintercept=seq(0, 380, by=10), alpha=0.10)
aggdata <-aggregate(ACSData9_14nona$MedHouseVal, by=list(ACSData9_14nona$County),
FUN=median, na.rm=TRUE)
aggdata
## Group.1 x
## 1 Adams 179350
## 2 Arapahoe 221700
## 3 Denver 262700
## 4 Douglas 334400
## 5 Jefferson 257700
summary(ACSData9_14nona$MedHouseVal, na.rm=TRUE)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 27500 181900 243100 272200 331000 1000000
You notice, first, that the distribution is fairly bell-shaped, though the mean ($272,200) sits above the median ($243,100), so it leans toward the high end. Many neighborhoods have housing values ranging from $150K to $350K. Some neighborhoods south of the city of Denver have much higher values, some higher than the Bureau’s survey can record (values are capped at $1,000,000).
You also notice the disparity among counties, with Douglas the highest ($334,400) and Adams the lowest ($179,350), and the other three close to the overall median across all tracts ($243,100).
So, which areas have the highest median housing values in 2014?
ACSData9_14nona.mhv %>%
arrange(-MedHouseVal) %>%
subset(MedHouseVal > 600000, select = c(Tract, JR_Name, County, MedHouseVal, MedHouseVal_10, HV_4YrChg))
## Tract JR_Name County MedHouseVal MedHouseVal_10 HV_4YrChg
## 1 56.36 Greenwood V. Arapahoe 1000000 1000000 0.0
## 2 67.04 Cherry Hills V. Arapahoe 1000000 1000000 0.0
## 3 67.05 Cherry Hills V. Arapahoe 1000000 1000000 0.0
## 4 67.12 Greenwood V. Arapahoe 920800 922900 -0.2
## 5 43.03 Hilltop Denver 839000 812700 3.2
## 6 56.12 GV/Little/Engle Arapahoe 833200 779000 7.0
## 7 40.06 Univ. Pk Denver 736200 707800 4.0
## 8 32.03 Country Club Denver 729600 723100 0.9
## 9 141.23 Castle Pines/CR Douglas 714300 805700 -11.3
## 10 68.57 Greenwood V. Arapahoe 700400 649200 7.9
## 11 141.16 Lone Tree/Other Douglas 697200 627500 11.1
## 12 120.34 Other Jefferson 697200 751400 -7.2
## 13 56.22 Littleton/CV Arapahoe 689900 759800 -9.2
## 14 39.01 Belcaro Denver 681900 579300 17.7
## 15 141.35 Highlnds Rch/Lo/S Douglas 642600 649000 -1.0
## 16 98.48 Evergreen Jefferson 621200 707900 -12.2
## 17 38 Cherry Creek D. Denver 605400 647500 -6.5
## 18 34.02 Wash. Pk Denver 603500 540500 11.7
## 19 68.08 Cherry Creek Arapahoe 602900 605500 -0.4
## 20 850 Centennial/? Arapahoe 600900 596800 0.7
First, you discover that a high proportion (45%) of the tracts are in Arapahoe County. In fact, four of them lie very close to each other in the Cherry Hills Village and Greenwood Village areas! The top three are capped at $1,000,000 due to the Census Bureau survey question. It seems that what you heard about this part of Denver is, in fact, true.
Second, Denver County itself has a respectable percentage of areas as well (30%). Third, and interestingly, Adams County has no top-twenty neighborhoods.
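The county shares quoted above can be checked with a short tabulation. This is a minimal sketch, assuming the county labels are typed in by hand from the top-twenty table (in table order); the names `top20_county`, `county_counts`, and `county_share` are illustrative and not part of the original script.

```r
# County of each tract in the top-twenty table above, in table order
# (copied by hand; an assumption-free restatement of the printed output).
top20_county <- c("Arapahoe", "Arapahoe", "Arapahoe", "Arapahoe", "Denver",
                  "Arapahoe", "Denver", "Denver", "Douglas", "Arapahoe",
                  "Douglas", "Jefferson", "Arapahoe", "Denver", "Douglas",
                  "Jefferson", "Denver", "Denver", "Arapahoe", "Arapahoe")

# Count tracts per county, then convert the counts to whole percentages.
county_counts <- table(top20_county)
county_share  <- round(100 * prop.table(county_counts))

# Largest share first; Adams never appears because it has no tract here.
sort(county_share, decreasing = TRUE)
```

In the full data frame, the same idea would apply to the `County` column of the filtered result rather than to a hand-typed vector.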
Which areas have the lowest median value in 2014?
ACSData9_14nona.mhv %>%
arrange(MedHouseVal) %>%
subset(MedHouseVal < 108800, select = c(Tract, JR_Name, County, MedHouseVal, MedHouseVal_10, HV_4YrChg))
## Tract JR_Name County MedHouseVal MedHouseVal_10 HV_4YrChg
## 1 81 Aurora Adams 27500 19200 43.2
## 2 93.16 Federal Hgts/T Adams 28500 34100 -16.4
## 3 93.21 Federal Hgts Adams 35400 32200 9.9
## 4 83.09 Aurora Adams 35900 29900 20.1
## 5 150 N. Wash./W Adams 38100 55800 -31.7
## 6 93.2 Federal Hgts/W Adams 46400 71600 -35.2
## 7 91.03 Thornton Adams 52400 24300 115.6
## 8 93.19 Federal Hgts Adams 52600 69500 -24.3
## 9 83.08 Aurora Adams 65000 116800 -44.3
## 10 820 Aurora Arapahoe 69700 110400 -36.9
## 11 93.22 Thornton Adams 80800 72500 11.4
## 12 70.89 Windsor Denver 96700 106200 -8.9
## 13 93.18 Thornton/FH Adams 97900 102600 -4.6
## 14 35 Elaria Swanesea Denver 98100 152800 -35.8
## 15 871 Aurora Arapahoe 100800 150200 -32.9
## 16 77.04 Aurora Arapahoe 101800 115800 -12.1
## 17 826 Aurora Arapahoe 104400 151400 -31.0
## 18 72.02 Aurora Arapahoe 104600 150600 -30.5
## 19 88.02 Derby Adams 106300 120200 -11.6
## 20 55.52 Sheridan Arapahoe 108700 113000 -3.8
Adams County dominates this list with 60% of the tracts, including the nine lowest! Why might this be the case, you wonder?
A few areas stand out in particular: Aurora appears most frequently, followed by Federal Heights and Thornton. You are curious to know why this is the case but choose to explore it later.
You also notice what is absent here: neither Jefferson nor Douglas County has a tract on this particular list.
You also want to have a handle on changes in housing values from 2010 to 2014. You re-arrange the data to sort by four-year changes in median housing values.
Which areas grew the most in housing values from 2010 to 2014?
qplot(x = HV_4YrChg, data = ACSData9_14, binwidth = 5,
xlab = 'Four year change, median housing value',
ylab = 'Number of neighborhoods')
describe(ACSData9_14$HV_4YrChg)
## vars n mean sd median trimmed mad min max range skew kurtosis
## 1 1 578 -1.03 14.69 -1.7 -1.68 8.08 -44.3 199.3 243.6 5.43 66.02
## se
## 1 0.61
summary(ACSData9_14$HV_4YrChg)
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## -44.300 -7.200 -1.700 -1.031 3.575 199.300 9
ACSData9_14nona.mhv$MedHouseVal_10_2 <-ACSData9_14nona$MedHouseVal_10/1000
n <-ACSData9_14nona.mhv %>%
filter(MedHouseVal_10 < 1000000 & HV_4YrChg < 100) %>%
ggplot(aes(x = MedHouseVal_10_2, y = HV_4YrChg)) +
geom_point(alpha = 0.7) + stat_smooth(method = "lm")
n + ylab("Four-year change") + xlab("Median housing value") +
labs(title = "Four-year change, median housing (2010-2014)") +
geom_hline(yintercept=seq(0, 20, by=10), alpha=0.10)
You notice in the graphs that the change in a neighborhood’s median value is far from uniform. Some areas experienced high growth rates - one as high as almost 200%, in fact! - likely due to their status as new development areas. You discover, overall, a slight decline (mean of -1.03, median of -1.7)! The dispersion of values - and the generally flat line - in the second graph suggest there is no uniform trend of certain housing brackets weathering the market better than others.
While it is outside the scope of the project to explore this further at this time, it is nonetheless interesting to note.
You are curious, nonetheless, so you make separate lists of highest-growth and lowest-growth neighborhoods. Where are the highest-growth areas?
ACSData9_14nona.mhv %>%
arrange(-HV_4YrChg) %>%
subset(HV_4YrChg >= 20.0, select = c(Tract, JR_Name, County, MedHouseVal, MedHouseVal_10, HV_4YrChg))
## Tract JR_Name County MedHouseVal MedHouseVal_10 HV_4YrChg
## 1 97.51 Berkley Adams 133200 44500 199.3
## 2 91.03 Thornton Adams 52400 24300 115.6
## 3 27.01 Capitol Hill Denver 272000 168500 61.4
## 4 872 Other Arapahoe 453700 288100 57.5
## 5 81 Aurora Adams 27500 19200 43.2
## 6 11.02 Highland Denver 339700 251600 35.0
## 7 40.04 Smoor Pk Denver 346800 261300 32.7
## 8 155 Virginia V. Denver 290600 219400 32.5
## 9 6 Jefferson Pk Denver 278300 210300 32.3
## 10 869 Other Arapahoe 220100 168200 30.9
## 11 68.58 Greenwood V. Arapahoe 259500 199300 30.2
## 12 5.01 Sloan Lake Denver 358300 287100 24.8
## 13 21 Baker Denver 302000 245600 23.0
## 14 70.37 Windsor Denver 281200 228700 23.0
## 15 29.02 Wash. Pk W Denver 484800 396200 22.4
## 16 24.02 Five Points Denver 336400 276200 21.8
## 17 4.01 Sunnyside Denver 315300 261900 20.4
## 18 3.01 Berkeley Denver 330500 274900 20.2
## 19 83.09 Aurora Adams 35900 29900 20.1
## 20 142.04 Other/Roxbrgh Pk Douglas 282800 235600 20.0
Two areas in Adams County grew very quickly during this period: a tract in Berkley (199%) and one in Thornton (116%). You surmise that these might be newer developments, getting their footing comparatively late. Both areas have lower-than-average values, so this might be the case.
You also notice Denver County tracts dominate this list - with 60% of the highest-growth tracts - while Jefferson County tracts appear nowhere. Is Jefferson a more mature market?
Among areas with the highest growth, there appears to be a healthy spread of housing values. Stated another way, while some of the hottest areas began as low-value areas in 2010 (certain tracts in Berkley, Thornton, Aurora), other tracts with very respectable growth exist with fairly high housing values. The Highland neighborhood in Denver County, for example, started at $251,600 in 2010 but jumped by 35% to $339,700 four years later. The Washington Park West neighborhood had a similar pattern, jumping 22.4% up to $484,800 in 2014.
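The four-year change figures can be reproduced from the two median values. The sketch below assumes `HV_4YrChg` is a simple percent change from the 2010 median to the 2014 median, rounded to one decimal; the source does not state the formula, but this assumption matches the printed rows. The helper `pct_change` and the `tracts` data frame are hypothetical names used only for illustration.

```r
# Hypothetical helper: percent change from old to new, one decimal place.
# The formula itself is an assumption that happens to match the table.
pct_change <- function(new, old) round(100 * (new - old) / old, 1)

# 2014 and 2010 medians copied from the highest-growth table above.
tracts <- data.frame(
  JR_Name        = c("Berkley", "Highland", "Wash. Pk W"),
  MedHouseVal    = c(133200, 339700, 484800),
  MedHouseVal_10 = c(44500, 251600, 396200)
)

# Recompute the change and compare it by eye with the HV_4YrChg column.
tracts$HV_4YrChg_check <- pct_change(tracts$MedHouseVal, tracts$MedHouseVal_10)
tracts
```

The recomputed values line up with the 199.3, 35.0, and 22.4 shown in the table, which is why the percent-change reading seems a reasonable one.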
You explore a little further, by reviewing neighborhoods with the steepest declines. What do you know about these areas?
ACSData9_14nona.mhv %>%
arrange(HV_4YrChg) %>%
subset(HV_4YrChg < -22.0, select = c(Tract, JR_Name, County, MedHouseVal, MedHouseVal_10, HV_4YrChg))
## Tract JR_Name County MedHouseVal MedHouseVal_10 HV_4YrChg
## 1 83.08 Aurora Adams 65000 116800 -44.3
## 2 820 Aurora Arapahoe 69700 110400 -36.9
## 3 35 Elaria Swanesea Denver 98100 152800 -35.8
## 4 93.2 Federal Hgts/W Adams 46400 71600 -35.2
## 5 871 Aurora Arapahoe 100800 150200 -32.9
## 6 150 N. Wash./W Adams 38100 55800 -31.7
## 7 826 Aurora Arapahoe 104400 151400 -31.0
## 8 72.02 Aurora Arapahoe 104600 150600 -30.5
## 9 15 Globeville Denver 116500 164200 -29.0
## 10 55.51 Sheridan/E Arapahoe 123000 170700 -27.9
## 11 67.01 Smoor Pk Denver 421600 575700 -26.8
## 12 83.05 Montbello Denver 111100 151500 -26.7
## 13 49.51 Glendale Arapahoe 118900 159100 -25.3
## 14 87.05 Commerce City Adams 123700 164200 -24.7
## 15 32.02 Cheesman Pk Denver 270300 357600 -24.4
## 16 93.19 Federal Hgts Adams 52600 69500 -24.3
## 17 45.06 Westwood Denver 115600 151000 -23.4
## 18 865 Aurora Arapahoe 303400 393000 -22.8
## 19 87.06 Commerce City Adams 117600 152000 -22.6
## 20 158 Lakewood Jefferson 215400 277600 -22.4
Three counties (Adams, Arapahoe, Denver) account, in almost equal parts, for the vast majority of such areas (95%), with one in Jefferson. You also discover that the list leans heavily - 80 percent - toward tracts that started in the lower quarter of the 2010 value range (below $188,700), with most of the rest (15%) in the higher quarter (above $325,000).
You are also curious to see how the higher-value areas have performed since 2010, so you select areas at or above the cutoff for the top tenth of 2010 values ($444,000). You drop the highest areas ($1,000,000) because, with values capped at that amount, you do not have a reliable metric for their four-year change (as noted above).
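The split by starting value can be tallied directly. This is a sketch under two assumptions: the 2010 medians are typed in by hand from the steepest-decline table, and the cut points are the 1st and 3rd quartiles printed earlier by `summary()` for `MedHouseVal_10` (188,700 and 325,000). The names `decl_val10`, `q1`, `q3`, and `bracket` are illustrative.

```r
# 2010 medians (MedHouseVal_10) of the twenty steepest-decline tracts,
# copied from the table above in table order.
decl_val10 <- c(116800, 110400, 152800, 71600, 150200, 55800, 151400,
                150600, 164200, 170700, 575700, 151500, 159100, 164200,
                357600, 69500, 151000, 393000, 152000, 277600)

# Quartile cut points taken from the earlier summary(MedHouseVal_10) output.
q1 <- 188700
q3 <- 325000

# Assign each tract to a bracket, then express the counts as percentages.
bracket <- cut(decl_val10, breaks = c(-Inf, q1, q3, Inf),
               labels = c("bottom quarter", "middle half", "top quarter"))
bracket_share <- round(100 * prop.table(table(bracket)))
bracket_share
```

With the full data, `quantile(ACSData9_14nona$MedHouseVal_10, c(0.25, 0.75))` would supply the cut points instead of the hard-coded values.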
ACSData9_14nona.mhv %>%
arrange(-MedHouseVal_10) %>%
subset(MedHouseVal_10 >= 444000 & MedHouseVal_10 < 1000000, select = c(Tract, JR_Name, County, MedHouseVal, MedHouseVal_10, HV_4YrChg))
## Tract JR_Name County MedHouseVal MedHouseVal_10 HV_4YrChg
## 4 67.12 Greenwood V. Arapahoe 920800 922900 -0.2
## 5 43.03 Hilltop Denver 839000 812700 3.2
## 6 141.23 Castle Pines/CR Douglas 714300 805700 -11.3
## 7 56.12 GV/Little/Engle Arapahoe 833200 779000 7.0
## 8 56.22 Littleton/CV Arapahoe 689900 759800 -9.2
## 9 120.34 Other Jefferson 697200 751400 -7.2
## 10 32.03 Country Club Denver 729600 723100 0.9
## 11 98.48 Evergreen Jefferson 621200 707900 -12.2
## 12 40.06 Univ. Pk Denver 736200 707800 4.0
## 13 68.57 Greenwood V. Arapahoe 700400 649200 7.9
## 14 141.35 Highlnds Rch/Lo/S Douglas 642600 649000 -1.0
## 15 38 Cherry Creek D. Denver 605400 647500 -6.5
## 16 141.16 Lone Tree/Other Douglas 697200 627500 11.1
## 17 141.22 Castle Pines/CPN Douglas 589600 616600 -4.4
## 18 146.02 Other/Franktown Douglas 564000 609900 -7.5
## 19 140.13 Castle Rock/TP/P Douglas 507300 608000 -16.6
## 20 68.08 Cherry Creek Arapahoe 602900 605500 -0.4
## 21 144.03 Castle Rock/L Douglas 533500 601500 -11.3
## 22 141.25 Lone Tree/CPN Douglas 527500 598800 -11.9
## 23 850 Centennial/? Arapahoe 600900 596800 0.7
## 24 98.37 Arvada/Other Jefferson 572900 593600 -3.5
## 25 864 Centennial/A Arapahoe 580000 584100 -0.7
## 26 98.45 Genesee/I Jefferson 592500 580200 2.1
## 27 39.01 Belcaro Denver 681900 579300 17.7
## 28 67.01 Smoor Pk Denver 421600 575700 -26.8
## 29 98.5 Other/Golden Jefferson 559400 570800 -2.0
## 30 141.31 Highlnds Rch/L Douglas 522800 570700 -8.4
## 31 56.29 Centennial Arapahoe 597100 552900 8.0
## 32 34.02 Wash. Pk Denver 603500 540500 11.7
## 33 145.06 Castle Rock Douglas 430500 529900 -18.8
## 34 19.02 Auraria Denver 450000 513900 -12.4
## 35 44.05 Lowry Field Denver 454800 511500 -11.1
## 36 98.42 Golden/P Jefferson 437000 500300 -12.7
## 37 98.46 Kittredge/E Jefferson 521400 499100 4.5
## 38 120.35 Other Jefferson 499800 495700 0.8
## 39 40.02 Wellshire Denver 503400 487900 3.2
## 40 17.01 Union Stn (LoDo) Denver 472000 483800 -2.4
## 41 34.01 Wash. Pk Denver 475300 480200 -1.0
## 42 33 Congress Pk Denver 496200 479600 3.5
## 43 853 Aurora Arapahoe 439800 474300 -7.3
## 44 43.06 Hilltop/LF Denver 461700 474200 -2.6
## 45 139.09 Franktown Douglas 442000 472700 -6.5
## 46 85.41 Thornton/TC Adams 432700 472400 -8.4
## 47 120.5 Other/Bow Mar Jefferson 450300 468700 -3.9
## 48 56.21 Columbine/Li Arapahoe 439400 464900 -5.5
## 49 867 Aurora/Other Arapahoe 425100 459600 -7.5
## 50 41.06 Stapleton Denver 457800 458600 -0.2
## 51 141.27 Castle Pines N. Douglas 441200 458600 -3.8
## 52 142.02 Other/Roxbrgh Pk Douglas 462300 454300 1.8
## 53 141.32 Highlnds Rch Douglas 447400 453700 -1.4
## 54 98.47 Evergreen Jefferson 478600 450200 6.3
## 55 120.36 Littleton Jefferson 449600 447300 0.5
## 56 851 Other Arapahoe 405500 446900 -9.3
## 57 68.55 Centennial/CCD Arapahoe 450100 445800 1.0
## 58 139.01 Aurora/Other Douglas 390800 444000 -12.0
o <-ACSData9_14nona.mhv %>%
filter(MedHouseVal_10 >= 444000 & MedHouseVal_10 < 1000000) %>%
ggplot(aes(x = MedHouseVal_10_2, y = HV_4YrChg)) +
geom_point(alpha = 0.7) + stat_smooth(method = "lm")
o + ylab("Four-year change") + xlab("Median housing value, 2010") +
labs(title = "Four-year change, median housing (2010-2014)") +
geom_hline(yintercept=seq(0, 20, by=10), alpha=0.10)
Douglas and Denver counties together hold a majority of the high-value neighborhoods (about 55%), with Arapahoe and Jefferson next (about 24% and 20%, respectively). Adams County has only one area, in Thornton/Todd Creek ($472,400 in 2010).
More importantly for you, you notice that the majority of such neighborhoods (65%) experienced some decline in value.
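The share of high-value tracts that lost value can be counted from the `HV_4YrChg` column of the table above. A minimal sketch, assuming the 55 values are copied by hand in table order; `hv_chg` and `share_declining` are illustrative names.

```r
# HV_4YrChg for the 55 high-value tracts, copied from the table above.
hv_chg <- c(-0.2, 3.2, -11.3, 7.0, -9.2, -7.2, 0.9, -12.2, 4.0, 7.9,
            -1.0, -6.5, 11.1, -4.4, -7.5, -16.6, -0.4, -11.3, -11.9, 0.7,
            -3.5, -0.7, 2.1, 17.7, -26.8, -2.0, -8.4, 8.0, 11.7, -18.8,
            -12.4, -11.1, -12.7, 4.5, 0.8, 3.2, -2.4, -1.0, 3.5, -7.3,
            -2.6, -6.5, -8.4, -3.9, -5.5, -7.5, -0.2, -3.8, 1.8, -1.4,
            6.3, 0.5, -9.3, 1.0, -12.0)

# A logical vector averages to the proportion of TRUE values,
# so this is the share of tracts with a negative four-year change.
share_declining <- mean(hv_chg < 0)
round(100 * share_declining, 1)
```

On the data frame itself, the equivalent would be `mean(HV_4YrChg < 0)` applied after the same `filter()` used for the graph.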
In order to understand better any potential explanations for housing values, you provide point graphs along with estimated regression lines (to highlight the best estimated pattern in the data).
For the sake of brevity, you choose to select a subset of interesting variables and display them in a series of correlations and some interesting graphs. In this way, you can begin to understand what factors appear to affect housing values the most.
You also choose to show multiple relationships together, graphically, by using a correlogram. With a correlogram (here, from the corrgram package), you can display a multivariate chart of correlations, showing not only the direction but the magnitude as well!
(There is one quirk in using correlograms within R: they do not handle missing values well. In order to solve this problem, you whittle down the dataset by dropping rows with missing values for the variable of interest, median housing value.)
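As a small illustration of why missing values matter for correlations, the sketch below uses toy numbers (hypothetical, not from the ACS file) and base R only. It shows the default behavior of `cor()` and two common ways around it; it says nothing about how corrgram itself treats missing values.

```r
# Toy data with one missing value in each column (hypothetical values).
toy <- data.frame(
  MedHouseVal = c(150, 200, NA, 320, 410, 500),
  PctBA       = c(12, 20, 25, NA, 48, 60)
)

# With the default use = "everything", any NA makes the result NA.
r_default <- cor(toy$MedHouseVal, toy$PctBA)

# Option 1: keep only rows that are complete on every column.
toy_complete <- toy[complete.cases(toy), ]
r_complete <- cor(toy_complete$MedHouseVal, toy_complete$PctBA)

# Option 2: let cor() drop incomplete pairs itself.
r_pairwise <- cor(toy$MedHouseVal, toy$PctBA, use = "pairwise.complete.obs")

c(default = r_default, complete = r_complete, pairwise = r_pairwise)
```

With only two columns the two options give the same answer; with many columns, pairwise deletion can use a different set of rows for each pair.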
What do you discover?
# Variables for the correlogram. The original list named IndInc_Perc75KPlus
# twice; the first occurrence is taken to be the 65-75K bracket used later.
ACSData9_14nona2 <- ACSData9_14nona[c("MedHouseVal",
                                      "Perc_18Plus",
                                      "Perc_65Plus",
                                      "MedAge",
                                      "Perc_Under18",
                                      "HHInc_Perc35to50K",
                                      "HHInc_Perc35to50K_Owner",
                                      "HHInc_Perc50to75K",
                                      "HHInc_Perc50to75K_Owner",
                                      "HHInc_Perc75to100K",
                                      "HHInc_Perc75to100K_Owner",
                                      "HHInc_PercUnder35K",
                                      "HHInc_Own_PercUnder35K",
                                      "HHInc_Perc35Kto100K",
                                      "HHInc_Own_Perc35Kto100K",
                                      "HHInc_PercOver100K",
                                      "HHInc_Own_PercOver100K",
                                      "HHs_Perc_MarriedCplFam",
                                      "MedianAge",
                                      "PercNotHSGrad",
                                      "PercGradorProfDegree",
                                      "IndInc_Perc25to35K",
                                      "IndInc_Perc35to50K",
                                      "IndInc_Perc50to65K",
                                      "IndInc_Perc65to75K",
                                      "IndInc_Median",
                                      "PercBAorMore",
                                      "IndInc_PercUpto25K",
                                      "IndInc_Perc25Kto75K",
                                      "IndInc_Perc75KPlus",
                                      "PercUnder150PercPov",
                                      "Perc_Rented",
                                      "MeanHrsWkd",
                                      "SchlDistRnkLYr")]
library(corrgram)
corrgram(ACSData9_14nona2, order=NULL,
lower.panel=panel.shade,
upper.panel=NULL,
text.panel=panel.txt,
main="Correlogram of data")
Is it true that income can help drive neighborhood housing? You are curious to focus on housing owners rather than just the neighborhood at large, figuring this to be a better representation of an individual’s choice to purchase.
p1 <- ggplot(aes(x = HHInc_Own_PercUnder35K,
y = MedHouseVal), data = ACSData9_14) +
geom_point(alpha = .4) +
labs(x="Percent of Owners with Income below $35K", y="Median House Value")
p1 + stat_smooth(method = "lm", alpha = .2, size = 0.5)
cor(ACSData9_14$MedHouseVal, ACSData9_14$HHInc_Own_PercUnder35K, use = "pairwise.complete.obs", method = c("pearson"))
## [1] -0.5633437
cor(ACSData9_14$MedHouseVal, ACSData9_14$HHInc_PercUnder35K, use = "pairwise.complete.obs", method = c("pearson"))
## [1] -0.5181778
You begin to notice a moderate negative relationship between the percent of owners earning below $35K and the neighborhood’s median house value (r = -0.563). You might expect this, given that owners with such income are unlikely to afford a high-value house.
What about ‘middle-class’ incomes and housing values, however?
p2 <-ggplot(aes(x = HHInc_Own_Perc35Kto100K,
y = MedHouseVal), data = ACSData9_14) +
geom_point(alpha = .4) +
labs(x="Percent of Owners with Income from $35K to $100K", y="Median House Value")
p2 + stat_smooth(method = "lm", alpha = .2, size = 0.5)
cor(ACSData9_14$MedHouseVal, ACSData9_14$HHInc_Perc35Kto100K, use = "pairwise.complete.obs", method = c("pearson"))
## [1] -0.6376349
cor(ACSData9_14$MedHouseVal, ACSData9_14$HHInc_Own_Perc35Kto100K, use = "pairwise.complete.obs", method = c("pearson"))
## [1] -0.7471155
cor(ACSData9_14$MedHouseVal, ACSData9_14$HHInc_Perc35to50K, use = "pairwise.complete.obs", method = c("pearson"))
## [1] -0.5674761
cor(ACSData9_14$MedHouseVal, ACSData9_14$HHInc_Perc35to50K_Owner, use = "pairwise.complete.obs", method = c("pearson"))
## [1] -0.5849582
cor(ACSData9_14$MedHouseVal, ACSData9_14$HHInc_Perc50to75K, use = "pairwise.complete.obs", method = c("pearson"))
## [1] -0.5021643
cor(ACSData9_14$MedHouseVal, ACSData9_14$HHInc_Perc50to75K_Owner, use = "pairwise.complete.obs", method = c("pearson"))
## [1] -0.5906898
cor(ACSData9_14$MedHouseVal, ACSData9_14$HHInc_Perc75to100K, use = "pairwise.complete.obs", method = c("pearson"))
## [1] -0.05733788
cor(ACSData9_14$MedHouseVal, ACSData9_14$HHInc_Perc75to100K_Owner, use = "pairwise.complete.obs", method = c("pearson"))
## [1] -0.2464598
As you search even further, you notice a strong - but negative - relationship between mid-range incomes and median house values. This holds true when considering household incomes from everyone or simply owners (r = -0.638 and -0.747, respectively). This remains true when drilling down to the $35K to $50K range as well as the $50K to $75K range (r from -.50 to -.59) but much less so for the $75K to $100K range (r from -0.06 to -0.25).
If it is true that a mortgage should not reach much further beyond 35% of one’s gross income, let’s take an example. Someone with an annual income of $70K can probably afford a $24K annual mortgage, which can translate to roughly $480K over 20 years (not adjusted for inflation). Thus, one can stretch far but only so far into the housing market.
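The back-of-the-envelope figures above work out as follows. This is only a sketch of the paragraph's own arithmetic: the 35% ceiling is the text's rule of thumb, and the calculation ignores interest, down payments, taxes, and insurance, so it should not be read as an affordability model. All variable names are illustrative.

```r
# Figures taken from the paragraph above.
annual_income <- 70000
max_share     <- 0.35   # rule-of-thumb ceiling on gross income
years         <- 20

# 35% of $70K is $24,500 a year.
max_annual_payment <- annual_income * max_share

# The text rounds this down to $24K a year before multiplying out.
rounded_payment <- 24000
total_paid      <- rounded_payment * years

c(max_annual_payment = max_annual_payment, total_paid = total_paid)
```

Because interest is left out, the total paid over the loan would buy a house priced noticeably below that total in practice.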
You are curious and want to go further up the income ladder. Is there a relationship between higher-income owners and the houses they choose?
p3 <-ggplot(aes(x = HHInc_Own_PercOver100K,
y = MedHouseVal), data = ACSData9_14) +
geom_point(alpha = .4) +
labs(x="Percent of Owners with Income at Least $100K", y="Median House Value")
p3 + stat_smooth(method = "lm", alpha = .2, size = 0.5)
cor(ACSData9_14$MedHouseVal, ACSData9_14$HHInc_PercOver100K, use = "pairwise.complete.obs", method = c("pearson"))
## [1] 0.7624962
cor(ACSData9_14$MedHouseVal, ACSData9_14$HHInc_Own_PercOver100K, use = "pairwise.complete.obs", method = c("pearson"))
## [1] 0.8292535
Household income levels can be important at the higher levels. There is a strong positive relationship between housing values and the share of a neighborhood’s households earning at least $100K, whether counting all households or only house-owner households (r = .762 and .829, respectively). If the results are to be believed, high owner-income neighborhoods tend to have higher housing values.
You also want to explore whether individual - rather than household - income helps explain housing values. You investigate median neighborhood incomes as well as income brackets.
p33 <-ggplot(aes(x = IndInc_Median,
y = MedHouseVal), data = ACSData9_14) +
geom_point(alpha = .4) +
labs(x="Median Individual Income", y="Median House Value")
p33 + stat_smooth(method = "lm", alpha = .2, size = 0.5)
cor(ACSData9_14$MedHouseVal, ACSData9_14$IndInc_Median, use = "pairwise.complete.obs", method = c("pearson"))
## [1] 0.718012
cor(ACSData9_14$MedHouseVal, ACSData9_14$IndInc_Perc25to35K, use = "pairwise.complete.obs", method = c("pearson"))
## [1] -0.5122076
cor(ACSData9_14$MedHouseVal, ACSData9_14$IndInc_Perc35to50K, use = "pairwise.complete.obs", method = c("pearson"))
## [1] -0.3774294
cor(ACSData9_14$MedHouseVal, ACSData9_14$IndInc_Perc50to65K, use = "pairwise.complete.obs", method = c("pearson"))
## [1] 0.05571146
cor(ACSData9_14$MedHouseVal, ACSData9_14$IndInc_Perc65to75K, use = "pairwise.complete.obs", method = c("pearson"))
## [1] 0.2038175
cor(ACSData9_14$MedHouseVal, ACSData9_14$IndInc_Perc75KPlus, use = "pairwise.complete.obs", method = c("pearson"))
## [1] 0.8224486
cor(ACSData9_14$MedHouseVal, ACSData9_14$IndInc_PercUpto25K, use = "pairwise.complete.obs", method = c("pearson"))
## [1] -0.5401895
cor(ACSData9_14$MedHouseVal, ACSData9_14$IndInc_Perc25Kto75K, use = "pairwise.complete.obs", method = c("pearson"))
## [1] -0.3553317
There is a strong positive relationship between a neighborhood’s median individual income and housing values (r = .718). As you see above, the relationship varies when drilling down to income categories. The relationship is strongest when considering high-income saturated areas (percentage earning $75K or more; r = .822) and weakest when exploring either the percent earning $50K to $65K or $65K to $75K (r = .056 and .204, respectively).
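For readers who want to see what the r values measure, here is a minimal sketch that computes Pearson's r by hand and compares it with `cor()`. The two vectors are toy values (hypothetical), standing in for any pair of tract-level columns.

```r
# Toy values (hypothetical), not taken from the ACS data.
x <- c(21, 25, 30, 34, 41, 55)
y <- c(150, 190, 210, 260, 330, 480)

# Pearson's r: the summed cross-products of deviations from the means,
# divided by the square root of the product of the summed squared deviations.
r_manual <- sum((x - mean(x)) * (y - mean(y))) /
  sqrt(sum((x - mean(x))^2) * sum((y - mean(y))^2))

# The built-in function should agree up to floating-point error.
r_builtin <- cor(x, y, method = "pearson")
all.equal(r_manual, r_builtin)
```

Because r only captures straight-line association, two variables can be strongly related in a curved way and still show a small r.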
You wonder whether the overall neighborhood’s demographics have an impact on local housing. It sounds right, no? One possibility is that stable families help undergird neighborhoods, which can thus help maintain or improve housing values. You also explore whether the size of a family can affect housing.
p4 <-ggplot(aes(x = HHs_Perc_MarriedCplFam,
y = MedHouseVal), data = ACSData9_14) +
geom_point(alpha = .4) +
labs(x="Percent of Households as Married-Couple Families", y="Median House Value")
p4 + stat_smooth(method = "lm", alpha = .2, size = 0.5)
cor(ACSData9_14$MedHouseVal, ACSData9_14$HHs_Perc_MarriedCplFam, use = "pairwise.complete.obs", method = c("pearson"))
## [1] 0.4211943
cor(ACSData9_14$MedHouseVal, ACSData9_14$HHs_AvgFamilySize, use = "pairwise.complete.obs", method = c("pearson"))
## [1] -0.3546878
You begin to consider this, as there appears to be a moderate positive connection between the percentage of married-couple families in an area and the median house value (r = .4212). You suspect part of this can be due to potential unmarried owners, who are not represented here. Perhaps there is a sufficient percentage of unmarried houseowners to affect the results.
You find, surprisingly, an inverse relationship between family size and housing values (r = -0.3547). Of course, the results are at the neighborhood (not individual) level, which might account for this.
You start to consider the correlation between adult education and housing. It is easy to assume such a link, so you explore this in the data. You also run correlations on education groups.
p6 <-ggplot(aes(x = PercBAorMore,
y = MedHouseVal), data = ACSData9_14) +
geom_point(alpha = .4) +
labs(x="Percent of People with at Least Bachelor's Degree", y="Median House Value")
p6 + stat_smooth(method = "lm", alpha = .2, size = 0.5)
cor(ACSData9_14$MedHouseVal, ACSData9_14$PercBAorMore, use = "pairwise.complete.obs", method = c("pearson"))
## [1] 0.7704379
cor(ACSData9_14$MedHouseVal, ACSData9_14$PercNotHSGrad, use = "pairwise.complete.obs", method = c("pearson"))
## [1] -0.5348605
cor(ACSData9_14$MedHouseVal, ACSData9_14$PercHSGradOnly, use = "pairwise.complete.obs", method = c("pearson"))
## [1] -0.6904984
cor(ACSData9_14$MedHouseVal, ACSData9_14$PercSomeCollegetoAA, use = "pairwise.complete.obs", method = c("pearson"))
## [1] -0.3964668
cor(ACSData9_14$MedHouseVal, ACSData9_14$PercHSGradorMore, use = "pairwise.complete.obs", method = c("pearson"))
## [1] 0.5348381
cor(ACSData9_14$MedHouseVal, ACSData9_14$PercBADegreeOnly, use = "pairwise.complete.obs", method = c("pearson"))
## [1] 0.6720215
cor(ACSData9_14$MedHouseVal, ACSData9_14$PercGradorProfDegree, use = "pairwise.complete.obs", method = c("pearson"))
## [1] 0.7952133
As discovered, the education level of adults in a neighborhood matters. The strongest findings link college success with housing: neighborhoods with a greater concentration of college graduates (and higher) tend to have considerably more expensive housing (r = .77), and the link is even stronger for graduate/professional degrees (r = .795). Conversely, areas with higher rates of high school dropouts tend to have considerably lower housing values (r = -.535). In fact, in any category exploring the percentage of people who do NOT complete (at least) a college degree, the relationship is negative. It seems the average earning power of a college degree is, in fact, important!
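The correlations above are reported without a formal test. If you wanted one, base R's `cor.test()` returns the estimate together with a confidence interval and a p-value. The sketch below runs it on simulated toy data (an assumption for illustration only), so its numbers say nothing about Denver.

```r
set.seed(1)  # toy data only; results are not from the ACS file

# Simulate a positive relationship plus noise (hypothetical variables).
pct_ba  <- runif(50, min = 5, max = 80)
med_val <- 100 + 4 * pct_ba + rnorm(50, sd = 40)

# Pearson correlation test; the null hypothesis is a true correlation of zero.
ct <- cor.test(med_val, pct_ba, method = "pearson")

ct$estimate   # sample correlation
ct$conf.int   # 95% confidence interval for the correlation
ct$p.value    # p-value under the null hypothesis
```

On the real data, the same call would take `ACSData9_14$MedHouseVal` and `ACSData9_14$PercBAorMore`; `cor.test()` drops incomplete pairs on its own.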
Is it true that poverty might be driving housing values? You investigate whether poverty and near-poverty rates are influential.
p7 <-ggplot(aes(x = PercUnder150PercPov,
y = MedHouseVal), data = ACSData9_14) +
geom_point(alpha = .4) +
labs(x="Percent of People Living Below 150 Percent of Poverty Level", y="Median House Value")
p7 + stat_smooth(method = "lm", alpha = .2, size = 0.5)
cor(ACSData9_14$MedHouseVal, ACSData9_14$PercUnder150PercPov, use = "pairwise.complete.obs", method = c("pearson"))
## [1] -0.5517491
cor(ACSData9_14$MedHouseVal, ACSData9_14$PercUnderPov, use = "pairwise.complete.obs", method = c("pearson"))
## [1] -0.4695731
You notice a link between poverty rates and housing values. Areas with higher shares of people living below 150 percent of the poverty level tend to have lower housing values (r = -.552); the link is a little weaker for the poverty rate itself (r = -.470).
Perhaps the ‘age’ of the houses can have an impact on housing values. You wonder, that is, whether people tend to value ‘older’ houses less than their newer - and presumably sturdier - counterparts. Is it true that, as houses age, they decrease in relative value? Is it also true that neighborhoods with ‘larger’ houses (multi-unit structures) tend to have higher-value houses?
p8 <-ggplot(aes(x = MedianHouseYr,
y = MedHouseVal), data = ACSData9_14) +
geom_point(alpha = .4) +
labs(x="Median Year House Built", y="Median House Value")
p8 + stat_smooth(method = "lm", alpha = .2, size = 0.5)
cor(ACSData9_14$MedHouseVal, ACSData9_14$MedianHouseYr, use = "pairwise.complete.obs", method = c("pearson"))
## [1] 0.2378942
cor(ACSData9_14$MedHouseVal, ACSData9_14$HHs_Total_UnitsinStructure_PercTwoPlusUnitStructure, use = "pairwise.complete.obs", method = c("pearson"))
## [1] -0.191573
Do neighborhoods with newer houses tend to have higher values? The results here suggest otherwise, as there does not appear to be a strong pattern in the data (r = .238). Of course, it can be the case that owners add material value to their older houses (e.g. house additions, a pool, a larger garage).
You discover little link between the percentage of multi-unit houses and housing values (r = -0.1916).
You are also curious to see whether renting or owning has any impact on housing values, and whether vacancies tend to depress housing values.
p10 <-ggplot(aes(x = Perc_Rented,
y = MedHouseVal), data = ACSData9_14) +
geom_point(alpha = .4) +
labs(x="Percent of Units Rented [not Owned]", y="Median House Value")
p10 + stat_smooth(method = "lm", alpha = .2, size = 0.5)
cor(ACSData9_14$MedHouseVal, ACSData9_14$Perc_Rented, use = "pairwise.complete.obs", method = c("pearson"))
## [1] -0.3403892
cor(ACSData9_14$MedHouseVal, ACSData9_14$Housing_Perc_Vacant, use = "pairwise.complete.obs", method = c("pearson"))
## [1] -0.04364004
The relationships appear to be modest at most here: the percentage of units rented has a modest negative link with housing values, and vacancies have almost none (r = -.340 and -.044, respectively).
You want to see whether the Great Recession had any long-lasting impact on neighborhood housing. You consider whether the average number of hours worked - as a proxy for employment status - is linked to housing values.
p11 <-ggplot(aes(x = MeanHrsWkd,
y = MedHouseVal), data = ACSData9_14) +
geom_point(alpha = .4) +
labs(x="Person's Average Usual Hours Worked", y="Median House Value")
p11 + stat_smooth(method = "lm", alpha = .2, size = 0.5)
cor(ACSData9_14$MedHouseVal, ACSData9_14$MeanHrsWkd, use = "pairwise.complete.obs", method = c("pearson"))
## [1] 0.2632697
The results suggest otherwise, with a small correlation at best (r = 0.263). Of course, it can be partly a function of the survey, since the types (and payscales) of work people pursue can vary greatly. Alternately, it can also be considered a crude measure of the economic ‘health’ of a community.
Last, with all the talk about child education driving people’s housing decisions, you want to see whether this holds any water.
p12 <-ggplot(aes(x = SchlDistRnkLYr, y = MedHouseVal), data = ACSData9_14) +
geom_point(alpha = .4) +
labs(x="School District Ranking, Prior Year", y="Median House Value")
p13 <- p12 + stat_smooth(method = "lm", alpha = .2, size = 0.5)
p13 + geom_jitter(width=.08)
cor(ACSData9_14$MedHouseVal, ACSData9_14$SchlDistRnkLYr, use = "pairwise.complete.obs", method = c("pearson"))
## [1] 0.3654633
cor(ACSData9_14$MedHouseVal, ACSData9_14$SchlDistRnk2YrAgo, use = "pairwise.complete.obs", method = c("pearson"))
## [1] 0.3490175
What you find is, at best, a relatively modest correlation between a school district’s performance and housing values (r = .3655). This is a bit surprising, given all the talk about schools and educational performance. Of course, you are not drilling down into individual schools and you are not tracking individual moves within the metropolitan area. Perhaps there is a stronger link to individual schools than to the district. You decide that will be for a different day.
While it is possible to point to the importance of private schools, you know that the national average of private school attendance is only around 10 percent and Denver is not among the top 10 percent of large metropolitan areas for private school attendance. (Source)
p14 <-ggplot(aes(x = MedAge,
y = MedHouseVal), data = ACSData9_14) +
geom_point(alpha = .4) +
labs(x="Neighborhood Median Age", y="Median House Value")
p14 + stat_smooth(method = "lm", alpha = .2, size = 0.5)
cor(ACSData9_14$MedHouseVal, ACSData9_14$Perc_18Plus, use = "pairwise.complete.obs", method = c("pearson"))
## [1] 0.1417501
cor(ACSData9_14$MedHouseVal, ACSData9_14$Perc_65Plus, use = "pairwise.complete.obs", method = c("pearson"))
## [1] 0.1692704
cor(ACSData9_14$MedHouseVal, ACSData9_14$MedAge, use = "pairwise.complete.obs", method = c("pearson"))
## [1] 0.4616155
cor(ACSData9_14$MedHouseVal, ACSData9_14$Perc_Under18, use = "pairwise.complete.obs", method = c("pearson"))
## [1] -0.1417501
cor(ACSData9_14$MedHouseVal, ACSData9_14$Perc_20to29, use = "pairwise.complete.obs", method = c("pearson"))
## [1] -0.3049146
cor(ACSData9_14$MedHouseVal, ACSData9_14$Perc_30to39, use = "pairwise.complete.obs", method = c("pearson"))
## [1] -0.2990187
cor(ACSData9_14$MedHouseVal, ACSData9_14$Perc_40to49, use = "pairwise.complete.obs", method = c("pearson"))
## [1] 0.3050462
cor(ACSData9_14$MedHouseVal, ACSData9_14$Perc_50to59, use = "pairwise.complete.obs", method = c("pearson"))
## [1] 0.3896994
cor(ACSData9_14$MedHouseVal, ACSData9_14$Perc_60to69, use = "pairwise.complete.obs", method = c("pearson"))
## [1] 0.4083626
Most relationships are modest, the strongest being median age (r = .4616). Some are negative, such as the percentage of people 20 to 29 or 30 to 39 (r = -0.3049 and -0.2990, respectively). Perhaps few people in their twenties purchase a house, but that would not explain the negative link for people in their thirties.
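The repeated `cor()` calls above could also be collapsed into a single loop. The following is only a sketch on a made-up data frame: the column names mirror the post, but the values are invented and `toy` is a hypothetical stand-in for `ACSData9_14`.

```r
# Toy data frame; column names mirror the post, values are made up
toy <- data.frame(
  MedHouseVal = c(150, 210, 260, 300, 340, 420, NA, 500),
  MedAge      = c(28, 31, 35, 36, 40, 44, 39, 47),
  Perc_20to29 = c(30, 26, 22, 20, NA, 12, 18, 9)
)

# Every column except the outcome is treated as a predictor
predictors <- setdiff(names(toy), "MedHouseVal")

# One Pearson correlation per predictor, using pairwise deletion as in the post
cors <- sapply(predictors, function(v) {
  cor(toy$MedHouseVal, toy[[v]], use = "pairwise.complete.obs")
})

# Sort from most positive to most negative for easier reading
cors_sorted <- sort(round(cors, 4), decreasing = TRUE)
print(cors_sorted)
```

Sorting the named vector makes it easy to spot which age groups sit at the positive and negative ends without scanning a dozen separate outputs.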
Okay, so perhaps you - like many others - hate staring at graphs. It will make more sense to create visual maps instead. In this way, you can see how neighborhoods can vary - sometimes drastically! After all, visual maps are a powerful way to tell a data story, no?
Since you have been thinking about the downtown and immediate southern suburbs first, you have decided to narrow the maps thus. A more interactive version of various variables is available separately as a Shiny app.
Creating these types of maps takes three basic steps. First, you will use a simple yet accurate starting map to represent the Denver metropolitan area. For this stage of the project, Google’s maps are sufficient.
Second, you collect some geographic files to represent the boundaries of each census tract. The Census Bureau provides shapefiles for just such an occasion, providing polygon shapes.
Third, in order to represent the variables of interest, you merge the shapefiles into your dataset.
There are a few necessary technical steps in order to use the shapefiles. You must read the GEOID variable as a character vector and then fortify the tract polygons, which converts them into a data frame that ggplot2 can draw.
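Why the character conversion matters can be shown with a small sketch. The identifiers below are illustrative 11-digit strings in the shapefile's style. It is an assumption here that the ACS table stored them as numbers, and `polys` and `acs` are hypothetical stand-ins for the real data frames.

```r
# Illustrative tract identifiers in the 11-digit style of the shapefile
geoid_shape <- c("08001007801", "08031000102")

# If the same identifiers are read as numbers, the leading zero is lost
geoid_numeric <- as.numeric(geoid_shape)
geoid_acs <- sprintf("%.0f", geoid_numeric)

# Option A: strip the first character on the shapefile side.
# This is only safe when every id starts with "0" (Colorado's state code is 08).
stripped <- substring(geoid_shape, 2)

# Option B: pad the numeric ids back to 11 digits
padded <- sprintf("%011.0f", geoid_numeric)

# With matching character keys, a left join keeps every polygon row
polys <- data.frame(id = stripped, long = c(-105.0, -104.9),
                    stringsAsFactors = FALSE)
acs <- data.frame(id = geoid_acs, MedHouseVal = c(243100, 330950),
                  stringsAsFactors = FALSE)
joined <- merge(polys, acs, by = "id", all.x = TRUE)
```

Either option works as long as both tables end up with keys of the same type and width; a mismatch does not raise an error, it simply yields `NA` for every joined column.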
# Start mapping
# Define map source type and color
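# Bounding box in the order get_map() expects: left, bottom, right, top
DenverMetro <- c(left = -106, bottom = 39.1, right = -104, top = 40.3)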
DenverMetroMap <- get_map(location=DenverMetro, source = "google",
maptype = "roadmap", zoom = 11, crop=FALSE)
# Add polygons from tract file
# https://www.census.gov/geo/maps-data/data/cbf/cbf_tracts.html
setwd('C:/Users/John/Denver_Housing_Project/ACS_Data/Final_Data')
getwd()
## [1] "C:/Users/John/Denver_Housing_Project/ACS_Data/Final_Data"
tract <- readOGR(dsn=".", layer = "cb_2014_08_tract_500k")
## OGR data source with driver: ESRI Shapefile
## Source: ".", layer: "cb_2014_08_tract_500k"
## with 1249 features
## It has 9 fields
tract@data$GEOID<-as.character(tract@data$GEOID)
# convert polygons to data.frame
Denver_tract<-fortify(tract, region = "GEOID")
str(Denver_tract)
## 'data.frame': 85865 obs. of 7 variables:
## $ long : num -105 -105 -105 -105 -105 ...
## $ lat : num 39.7 39.7 39.7 39.7 39.7 ...
## $ order: int 1 2 3 4 5 6 7 8 9 10 ...
## $ hole : logi FALSE FALSE FALSE FALSE FALSE FALSE ...
## $ piece: Factor w/ 4 levels "1","2","3","4": 1 1 1 1 1 1 1 1 1 1 ...
## $ id : chr "08001007801" "08001007801" "08001007801" "08001007801" ...
## $ group: Factor w/ 1276 levels "08001007801.1",..: 1 1 1 1 1 1 1 1 2 2 ...
Denver_tract$id <-substring(Denver_tract$id, 2) # id had extra 0 to left
str(Denver_tract$id)
## chr [1:85865] "8001007801" "8001007801" "8001007801" ...
library(reshape)
Denver_tract <-rename(Denver_tract, c('id'='id2'))
##########################################
#
# Join polygons to housing value data
#
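# After the rename the tract id lives in id2; copy it back to id explicitly
# rather than relying on partial matching of `$id`
Denver_tract$id <-as.character(Denver_tract$id2)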
ACSData9_14nona$id <-as.character(ACSData9_14nona$id)
ACSData10_14 <-left_join(Denver_tract, ACSData9_14nona, by=c('id'))
save(ACSData10_14, file = "ACSData10.RData")
###########################
#
# Plot median housing value
#
describe(ACSData9_14$MedHouseVal, na.rm = TRUE)
## vars n mean sd median trimmed mad min max range
## 1 1 578 272222.5 137976.3 243100 254655.6 109564.1 27500 1e+06 972500
## skew kurtosis se
## 1 1.83 5.53 5739.06
quantile(ACSData9_14$MedHouseVal, c(.875, .75, .625, .5, .375, .25, .125), na.rm = TRUE)
## 87.5% 75% 62.5% 50% 37.5% 25% 12.5%
## 396812.5 330950.0 281012.5 243100.0 215150.0 181900.0 149325.0
# Plot median housing values
ggmap(DenverMetroMap) +
# group = group keeps multi-piece tracts from being drawn as one connected path
geom_polygon(aes(x = long, y = lat, group=group, fill = MedHouseVal),
data = ACSData10_14, color="black", alpha = .4, size = .2) +
scale_fill_gradientn(colours = c("red4", "red2", "red", "lightcoral", "lightblue3", "cyan4", "blue", "midnightblue"),
values = c(1, .3968125, .330950, .2810125, .243100, .215150, .181900, .149325, 0)) +
labs(x = 'Longitude', y = 'Latitude') + ggtitle('Median House Value by Census Tract, 2014')
#########################
#
# Plot Low-income households
#
describe(ACSData9_14$HHInc_Own_PercUnder35K, na.rm = TRUE)
## vars n mean sd median trimmed mad min max range skew kurtosis
## 1 1 580 17.22 10.79 14.65 16 9.41 0 100 100 1.7 6.64
## se
## 1 0.45
quantile(ACSData9_14$HHInc_Own_PercUnder35K, c(.875, .75, .625, .5, .375, .25, .125), na.rm = TRUE)
## 87.5% 75% 62.5% 50% 37.5% 25% 12.5%
## 29.6625 22.2250 18.5000 14.6500 11.9125 9.3000 6.8375
ggmap(DenverMetroMap) +
geom_polygon(aes(x = long, y = lat, group=group, fill = HHInc_Own_PercUnder35K), data = ACSData10_14, color="black", alpha = .4, size = .2) +
scale_fill_gradientn(colours = c("red4", "red2", "red", "lightcoral", "lightblue3", "cyan4", "blue", "midnightblue"),
values = c(1, .296625, .22225, .185, .1465, .119125, .093, .068375, 0)) +
labs(x = 'Longitude', y = 'Latitude') + ggtitle('Percent Owners with Household Income Below $35K')
##########################################
#
# Plot middle-income values
#
describe(ACSData9_14$HHInc_Own_Perc35Kto100K, na.rm = TRUE)
## vars n mean sd median trimmed mad min max range skew kurtosis
## 1 1 580 45.5 13.56 46.65 45.98 14.08 0 79.8 79.8 -0.37 0
## se
## 1 0.56
quantile(ACSData9_14$HHInc_Own_Perc35Kto100K, c(.875, .75, .625, .5, .375, .25, .125), na.rm = TRUE)
## 87.5% 75% 62.5% 50% 37.5% 25% 12.5%
## 60.400 55.100 51.000 46.650 42.600 36.200 28.875
ggmap(DenverMetroMap) +
geom_polygon(aes(x = long, y = lat, group=group, fill = HHInc_Own_Perc35Kto100K), data = ACSData10_14, color="black", alpha = .4, size = .2) +
scale_fill_gradientn(colours = c("red4", "red2", "red", "lightcoral", "lightblue3", "cyan4", "blue", "midnightblue"),
values = c(1, .604, .551, .51, .4665, .426, .362, .28875, 0)) +
labs(x = 'Longitude', y = 'Latitude') + ggtitle('Percent Owners with Household Income from $35K to $100K')
##########################################
#
# Plot upper-income values
#
describe(ACSData9_14$HHInc_Own_PercOver100K, na.rm = TRUE)
## vars n mean sd median trimmed mad min max range skew kurtosis
## 1 1 580 37.28 19.15 36.1 36.7 21.65 0 100 100 0.25 -0.7
## se
## 1 0.79
quantile(ACSData9_14$HHInc_Own_PercOver100K, c(.875, .75, .625, .5, .375, .25, .125), na.rm = TRUE)
## 87.5% 75% 62.5% 50% 37.5% 25% 12.5%
## 61.4875 51.5000 44.4750 36.1000 28.7125 21.6000 14.2125
ggmap(DenverMetroMap) +
geom_polygon(aes(x = long, y = lat, group=group, fill = HHInc_Own_PercOver100K),
data = ACSData10_14, color="black", alpha = .4, size = .2) +
scale_fill_gradientn(colours = c("red4", "red2", "red", "lightcoral", "lightblue3", "cyan4", "blue", "midnightblue"),
values = c(1, .614875, .515, .44475, .361, .287125, .216, .142125, 0)) +
labs(x = 'Longitude', y = 'Latitude') + ggtitle('Percent Owners with Household Income Above $100K')
##########################################
#
# Plot married-couple values
#
describe(ACSData9_14$HHs_Perc_MarriedCplFam, na.rm = TRUE)
## vars n mean sd median trimmed mad min max range skew kurtosis
## 1 1 583 48.75 18.15 47.4 48.73 19.72 5.6 90.3 84.7 0.04 -0.68
## se
## 1 0.75
quantile(ACSData9_14$HHs_Perc_MarriedCplFam, c(.875, .75, .625, .5, .375, .25, .125), na.rm = TRUE)
## 87.5% 75% 62.5% 50% 37.5% 25% 12.5%
## 71.750 62.650 54.550 47.400 42.025 35.500 27.800
ggmap(DenverMetroMap) +
geom_polygon(aes(x = long, y = lat, group=group, fill = HHs_Perc_MarriedCplFam),
data = ACSData10_14, color="black", alpha = .4, size = .2) +
scale_fill_gradientn(colours = c("red4", "red2", "red", "lightcoral", "lightblue3", "cyan4", "blue", "midnightblue"),
values = c(1, .7175, .6265, .5455, .474, .42025, .355, .278, 0)) +
labs(x = 'Longitude', y = 'Latitude') + ggtitle('Percent Married Couple Families, 2014')
##########################################
#
# Plot percent with college degree values
#
describe(ACSData9_14$PercBAorMore, na.rm = TRUE)
## vars n mean sd median trimmed mad min max range skew kurtosis
## 1 1 584 39.48 19.84 39.4 39.31 23.8 2.8 86.5 83.7 0.07 -0.97
## se
## 1 0.82
quantile(ACSData9_14$PercBAorMore, c(.875, .75, .625, .5, .375, .25, .125), na.rm = TRUE)
## 87.5% 75% 62.5% 50% 37.5% 25% 12.5%
## 63.5125 55.2000 47.5000 39.4000 31.3625 23.1750 13.5000
ggmap(DenverMetroMap) +
geom_polygon(aes(x = long, y = lat, group=group, fill = PercBAorMore),
data = ACSData10_14, color="black", alpha = .4, size = .2) +
scale_fill_gradientn(colours = c("red4", "red2", "red", "lightcoral", "lightblue3", "cyan4", "blue", "midnightblue"),
values = c(1, .635125, .552, .475, .394, .313625, .23175, .135, 0)) +
labs(x = 'Longitude', y = 'Latitude') + ggtitle('Percent of People with A College Degree or More')
##########################################
#
# Plot percent of people below 150 percent poverty level values
#
describe(ACSData9_14$PercUnder150PercPov, na.rm = TRUE)
## vars n mean sd median trimmed mad min max range skew kurtosis
## 1 1 583 20.22 15.04 15.7 18.5 13.94 0.3 92.4 92.1 0.99 0.66
## se
## 1 0.62
quantile(ACSData9_14$PercUnder150PercPov, c(.875, .75, .625, .5, .375, .25, .125), na.rm = TRUE)
## 87.5% 75% 62.5% 50% 37.5% 25% 12.5%
## 39.800 30.350 21.700 15.700 11.225 8.200 5.200
ggmap(DenverMetroMap) +
geom_polygon(aes(x = long, y = lat, group=group, fill = PercUnder150PercPov),
data = ACSData10_14, color="black", alpha = .4, size = .2) +
scale_fill_gradientn(colours = c("red4", "red2", "red", "lightcoral", "lightblue3", "cyan4", "blue", "midnightblue"),
values = c(1, .398, .3035, .217, .157, .11225, .082, .052, 0)) +
labs(x = 'Longitude', y = 'Latitude') + ggtitle('Percent Living Below 150 Percent Poverty Level')
####################################
#
# Plot percent vacant data
#
describe(ACSData9_14$Housing_Perc_Vacant, na.rm = TRUE)
## vars n mean sd median trimmed mad min max range skew kurtosis se
## 1 1 583 5.28 3.91 4.8 4.94 3.56 0 33 33 1.64 6.89 0.16
quantile(ACSData9_14$Housing_Perc_Vacant, c(.875, .75, .625, .5, .375, .25, .125), na.rm = TRUE)
## 87.5% 75% 62.5% 50% 37.5% 25% 12.5%
## 9.425 7.300 6.000 4.800 3.600 2.550 1.200
# Plot percent vacant values
ggmap(DenverMetroMap) +
geom_polygon(aes(x = long, y = lat, group=group, fill = Housing_Perc_Vacant),
data = ACSData10_14, color="black", alpha = .4, size = .2) +
scale_fill_gradientn(colours = c("red4", "red2", "red", "lightcoral", "lightblue3", "cyan4", "blue", "midnightblue"),
values = c(1, .09425, .073, .06, .048, .036, .0255, .012, 0)) +
labs(x = 'Longitude', y = 'Latitude') + ggtitle('Percent Housing Units Vacant by Census Tract, 2014')
####################################
#
# Plot renter data
describe(ACSData9_14$Perc_Rented, na.rm = TRUE)
## vars n mean sd median trimmed mad min max range skew kurtosis
## 1 1 583 35.46 23.71 31.19 33.54 26.71 0 100 100 0.58 -0.5
## se
## 1 0.98
quantile(ACSData9_14$Perc_Rented, c(.875, .75, .625, .5, .375, .25, .125), na.rm = TRUE)
## 87.5% 75% 62.5% 50% 37.5% 25% 12.5%
## 65.697053 51.911798 41.133582 31.193581 23.761539 15.011655 7.952541
# Plot percent rented values
ggmap(DenverMetroMap) +
geom_polygon(aes(x = long, y = lat, group=group, fill = Perc_Rented),
data = ACSData10_14, color="black", alpha = .4, size = .2) +
scale_fill_gradientn(colours = c("red4", "red2", "red", "lightcoral", "lightblue3", "cyan4", "blue", "midnightblue"),
values = c(1, .65697053, .51911798, .41133582, .31193581, .23761539, .15011655, .07952541, 0)) +
labs(x = 'Longitude', y = 'Latitude') + ggtitle('Percent Housing Units Rented by Census Tract, 2014')
####################################
#
# Plot school district rank data
describe(ACSData9_14$SchlDistRnkLYr, na.rm = TRUE)
## vars n mean sd median trimmed mad min max range skew kurtosis
## 1 1 587 0.48 0.21 0.56 0.48 0.3 0.09 0.77 0.68 -0.07 -1.4
## se
## 1 0.01
quantile(ACSData9_14$SchlDistRnkLYr, c(.875, .75, .625, .5, .375, .25, .125), na.rm = TRUE)
## 87.5% 75% 62.5% 50% 37.5% 25% 12.5%
## 0.763 0.682 0.590 0.558 0.294 0.294 0.205
ggmap(DenverMetroMap) +
geom_polygon(aes(x = long, y = lat, group=group, fill = SchlDistRnkLYr),
data = ACSData10_14, color="black", alpha = .4, size = .2) +
scale_fill_gradientn(colours = c("red4", "red2", "red", "lightcoral", "lightblue3", "cyan4", "blue", "midnightblue"),
values = c(1, .763, .682, .59, .59, .558, .426, .294, 0)) +
labs(x = 'Longitude', y = 'Latitude') + ggtitle('School District State Rank by Census Tract, 2013')
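A note on the `values` vectors used in the maps above: `scale_fill_gradientn()` interprets them as positions between 0 and 1 along the variable's observed range. Dividing a quantile by 100 (or by one million for house values) therefore only approximates the intended break point unless the variable really runs from 0 to that number. The sketch below shows one way to derive the positions from the quantiles directly. The vector `x` is made up and merely stands in for a tract-level variable such as percent vacant.

```r
# Made-up percentages standing in for a tract-level variable
x <- c(0, 1.2, 2.5, 3.6, 4.8, 6, 7.3, 9.4, 33)

# Quantile break points at the same probabilities as in the post
probs <- c(.125, .25, .375, .5, .625, .75, .875)
breaks <- quantile(x, probs, na.rm = TRUE)

# Rescale each break point to the 0-1 range of the observed data
rng <- range(x, na.rm = TRUE)
positions <- (breaks - rng[1]) / (rng[2] - rng[1])

# Add the end points so the vector spans the whole gradient
values <- c(0, unname(positions), 1)
print(round(values, 4))
```

This sketch lists the positions in ascending order, whereas the post lists them from high to low, so the colour vector would have to run from the low-end colour to the high-end colour if these values were substituted into the plots.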